Multivariate analysis of variegated expression in Neurons: A strategy for unbiased localization of gene function to candidate brain regions in larval zebrafish

Behavioral screens in model organisms have greatly facilitated the identification of genes and genetic pathways that regulate defined behaviors. Identifying the neural circuitry via which specific genes function to modify behavior remains a significant challenge in the field. Tissue- and cell type-specific knockout, knockdown, and rescue experiments serve this purpose, yet in zebrafish screening through dozens of candidate cell-type-specific and brain-region specific driver lines for their ability to rescue a mutant phenotype remains a bottleneck. Here we report on an alternative strategy that takes advantage of the variegation often present in Gal4-driven UAS lines to express a rescue construct in a neuronal tissue-specific and variegated manner. We developed and validated a computational pipeline that identifies specific brain regions where expression levels of the variegated rescue construct correlate with rescue of a mutant phenotype, indicating that gene expression levels in these regions may causally influence behavior. We termed this unbiased correlative approach Multivariate Analysis of Variegated Expression in Neurons (MAVEN). The MAVEN strategy advances the user’s capacity to quickly identify candidate brain regions where gene function may be relevant to a behavioral phenotype. This allows the user to skip or greatly reduce screening for rescue and proceed to experimental validation of candidate brain regions via genetically targeted approaches. MAVEN thus facilitates identification of brain regions in which specific genes function to regulate larval zebrafish behavior.

1 Perform the mating cross for your experiment. Here, we describe an example cross for Gal4 x UAS-induced rescue of the loss-of-function phenotype. Depending on your strategy, your cross may differ. Cross fish that are heterozygous (or homozygous mutant, if available) for a loss-of-function mutation in the gene of interest. At least one fish should carry a Gal4 for the cell type of interest, and at least one should carry a UAS construct that expresses a tagged version of the target gene.
We kept larvae from different individual mating crosses separate during raising, behavior, and analysis, in case patterns of variegation were substantially different between our mating pairs. Ultimately, when we did not observe major differences between pairs, we analyzed all the data from our different mating pairs together.
Assay phenotype of interest
2 Assay your phenotype of interest in your larvae (see Figure 2 of the associated paper). Sort the larvae according to their phenotype and keep larvae with different phenotypes separate.

Age of larvae:
While 6 dpf is preferable for registration, we have also successfully registered the brains of 5 dpf larvae. We have not attempted registering brains at any other age. If you assay phenotypes earlier, we recommend allowing larvae to develop to 6 dpf before fixing them. However, this assumes that Gal4 x UAS expression patterns at 6 dpf will reliably report expression patterns earlier in development. Because this assumption may not hold, we strongly recommend finding a way to measure phenotypes as close to 6 dpf as possible.
Separating larvae into two phenotypic groups: In our case, phenotypes did not naturally separate into a bimodal distribution for easy division into two groups.
Instead, phenotypes spread along a wide continuum. Larvae exhibited decision-making bias all the way from 100% reorientation-biased to 100% escape-biased and at every point in between. We chose to collect only larvae on the relatively extreme ends of the spectrum for our analysis. For the escape-biased group, we collected larvae that performed 75% or greater escapes, and for the reorientation-biased group, we collected larvae that performed 75% or greater reorientations. While this somewhat limited our throughput because we discarded a substantial fraction of our larvae, it allowed us to collect two groups with a major difference in their behavior. You will have to use your own judgement to define the groups that you will use for comparison.
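The 75% cutoff described above can be sketched in a few lines. This is only an illustration in Python, not part of the published pipeline; the function name, the per-larva response counts, and the example larvae are hypothetical, and the 0.75 threshold is the only value taken from the text.

```python
def assign_group(n_escapes, n_reorientations, threshold=0.75):
    """Return 'escape', 'reorientation', or None for an intermediate larva.

    Hypothetical helper: the counts are the number of scored responses of
    each type for one larva; 0.75 is the cutoff described in the protocol.
    """
    total = n_escapes + n_reorientations
    if total == 0:
        return None  # no scorable responses, so the larva cannot be grouped
    if n_escapes / total >= threshold:
        return "escape"
    if n_reorientations / total >= threshold:
        return "reorientation"
    return None  # intermediate larvae are not collected


# Hypothetical (escapes, reorientations) counts for three larvae
larvae = {"fish1": (18, 2), "fish2": (10, 10), "fish3": (3, 17)}
groups = {name: assign_group(*counts) for name, counts in larvae.items()}
print(groups)
```

Larvae that return None fall between the two extremes and would be discarded under this scheme; changing the threshold trades throughput against the size of the behavioral difference between groups.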
Precaution on gain-of-function overexpression-based strategies: It is important to note that gain-of-function phenotypes can result from expressing a gene at the wrong developmental time, in the wrong cell type, or in unusual abundance. As such, overexpression-induced gain-of-function phenotypes may not necessarily be informative as to the gene's endogenous function. For our CaSR example, we were confident that our gain-of-function phenotype was related to the decision-making function of CaSR for several reasons. First, it was qualitatively the opposite of the loss-of-function phenotype. Second, it was induced by the same manipulation (overexpression in neurons) that rescued the loss-of-function phenotype. Third, it was the same as the effect of applying a pharmacological agonist of CaSR to wild-type larvae (Jain et al. 2018), which naturally express CaSR in the endogenous pattern. Moreover, we confirmed our general results obtained using the gain-of-function strategy with loss-of-function rescue data at every step of our analysis and validation. In summary, the larva collection strategy used for MAVEN should be carefully considered. The gain-of-function strategy was available to us because CaSR exerts bidirectional control of our phenotype of interest. It is likely that most who wish to apply this technique will be best served by the loss-of-function rescue strategy.

Storage of Larvae (Optional)
After phenotyping but before fixation, larvae can be stored in individual wells in 100% methanol for 1-2 days while phenotypic analysis steps are completed. It is entirely possible to skip the methanol step and proceed straight to fixation if desired. Larvae should not be allowed to dry out at any point.

Labeling larval phenotype
Larvae should be physically labeled such that their phenotype is clearly identifiable (e.g. by cutting their tails bluntly vs. at an angle, or pulling off pectoral fins). We have not attempted removing an eye because we were concerned this would affect registration to the 3D atlas.
All larvae from all phenotypes should be fixed at the same time and stained in the same tube to minimize artifacts.

Fixation
Fix larvae in 4% PFA in PBS + 0.25% Triton (PBS-T), overnight (O/N) at 4 °C. Note that with MAP-mapping, quick fixation is essential due to the rapid kinetics of ERK phosphorylation, whereas since this protocol only uses tERK, exact timing of fixation is not essential for success. We have found it best to use room-temperature PFA and fix at RT, NOT on ice, then move the larvae to 4 °C after about 5-10 minutes.
5.1 Wash off PFA with three 5-minute PBS-T washes.
5.2 Larvae can be stored for 1-2 weeks at 4 °C in PBS (not PBS-T), or you may proceed directly to the "Preparation for Immunostaining" section.
Preparation for Immunostaining 2h
6 Bleach larvae (skip this step if you raised larvae in PTU; bleaching is only needed to remove pigment so that brains can be imaged without obstruction). After this point, be very careful not to lose larvae in pipetting and washing steps. You may wish to use a glass pipette rather than plastic both for better visibility and to avoid larvae sticking to the sides.
Original PTU protocol:
Note on PTU: We strongly encourage validating that the phenotype of interest is not affected by PTU before using it on larvae for this protocol. PTU is known to affect autophagy, eye development, and visual behaviors. If there is any doubt, it is better to raise larvae in normal embryo media and bleach them after fixation.
6.2 Incubate on a rocker at RT for about 10 minutes or at 55 degrees for about 5 minutes, until eyes are light orange in color, then immediately rinse off the bleach with 2-3 quick PBS-T washes, followed by one 5-minute PBS-T wash. Larvae will continue to bleach until they are completely washed, so be careful to begin washing when they are slightly darker than needed.
Remove blocking solution from larvae and apply primary antibody. I use 1000 uL of primary in a 1.5 mL Eppendorf tube for 50-100 larvae, but less can be used if necessary as long as the larvae remain completely submerged in primary while incubating overnight.
Note: this protocol should hypothetically work with any tagged expression construct, not just GFP-tagged constructs. Substitute antibodies as appropriate in order to visualize your expression construct.
10.2 Incubate overnight on a gentle rocker or rotator at 4 degrees C.
10.3 Wash off primary 3X 15 min in PBS-T on a rocker at room temperature. It is possible to save the primary and re-use it, but primary may diminish in concentration and suffer from repeated freeze-thaw cycles over time, so re-use primary antibody at your own risk.
overnight or up to 2 weeks. I find that pre-incubating the larvae in a Vectashield:PBS mixture prevents them from shrinking / wrinkling, which occurs when they are moved straight from PBS to pure Vectashield. Generally speaking, many storage and/or mounting methods are likely acceptable, so long as they preserve the shape of the brain.
Imaging
13 Before imaging of experimental larvae, it is highly advisable to prepare test larvae and attempt to register brains to the ZBrain registration image using your imaging settings. For example, the original Randlett et al. paper uses a 20X water immersion lens for imaging. We used a 20X air lens at 0.8X digital zoom. We also tested a 10X air lens with 1.6X digital zoom, but were not successful. You may have to modify this section heavily depending on your microscope. Ultimately, the highest priority is to find settings that allow you to reliably register your brains to the reference brain. Consult the reference brain when choosing your own settings. In particular, make sure that the deepest parts of the brain are still visible.
13.1 Mount Vectashield-soaked larvae in 1.1%-1.25% low-melt agarose in a glass-bottomed petri dish, dorsal side down. Multiple larvae can be mounted in the same dish for higher efficiency.
It is IMPERATIVE that the larvae are mounted with no tilt, either left to right or front to back. Even small amounts of tilt will compromise later registration to the reference brain. When in doubt, it is better to very gently unmount and re-mount a larva than to proceed with a tilted specimen.
13.3 For our experiments, we used a Zeiss 880 microscope with a 20X 0.8 NA air lens at 0.8 zoom. The "tile" function of Zen was used to capture and stitch together two images, one including the forebrain, midbrain, and rostral part of the hindbrain and the other including the caudal hindbrain and anterior spinal cord.
Step size was 2 microns. A brain usually comprised around 130-150 slices. Laser intensity and gain were calibrated such that the brightest neurons in the brain were saturated, because otherwise signal in the dimmest neurons was lost. (Note that saturated pixels exist in some portions of the reference brain as well--it is best to attempt to match the staining of the reference brain as closely as possible.) 3-5 larvae were inspected before final settings were chosen, due to the variability in brightness of the GFP signal between brains. Ideally, the full range of each channel should be utilized. Once settings were determined, the same imaging settings were used for every brain in a staining batch. Images were saved in 8-bit, because they will be downsampled to 8-bit at a later step anyway.
For a sense of exactly which parts of the fish to image, see the reference brain. This image includes the entire forebrain and olfactory pits all the way back to the pectoral fins. Neglecting to include parts of the brain and spinal cord in your image that are included in the reference brain, or including regions that are not included in the reference brain, can lead to stretching problems with the registration.

Zeiss LSM880
13.4 After imaging, place each individually-identified larva in a well of a genotyping plate, keeping careful track of which larva corresponds to which image and phenotype.
14 Once imaging is complete, genotype the larvae according to your own protocol.
Another option is to genotype larvae before phenotyping and fixing them. To pre-genotype live zebrafish larvae, we recommend the protocol by Zhang et al. (2020). If you use this protocol, you must mark each larva's genotype, just as you marked its phenotype. Note that this prevents you from performing the imaging part of the protocol blind to genotype.
When we enriched our samples for mutants, we used their protocol with some modifications, described here. Briefly, 2 dpf larvae were dechorionated by pretreating with pronase. Larvae were rinsed 3X in DNA collection buffer with tricaine, placed in DNA collection solution, and incubated at 37 degrees for 30 minutes without shaking. Supernatant solution was mixed with lysis buffer and incubated at 95 degrees for 5 minutes, while embryos were returned to E3 in individual wells. Supernatant solution was then genotyped using proprietary KASP primers from LGC Genomics. Note that KASP primers often work well with very small amounts of gDNA; other genotyping protocols, particularly those that require more gDNA, may not succeed.
15.2 Split channels using Image > Color > Split Channels.
15.3 If you imaged on the Zeiss 880 with the dorsal side of the fish closest to the coverslip, you must flip the Z orientation using Image > Transform > Flip Z. As the slice number increases, you should approach the dorsal side of the brain; look at the reference brain to be sure you have it right. If your brain is not in the same orientation as the reference brain, the registration will fail.
15.4 Save individual channels as .nrrd files with _01 suffix for the tERK channel and _02 for the GFP (or other marker of your expression construct) channel.
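Because registration expects one _01 (tERK) and one _02 (GFP) file per brain, it can help to confirm that every brain has both channels before starting. The following is a minimal Python sketch of such a check, offered only as an illustration; the function name is hypothetical, and it assumes that the suffix sits immediately before the .nrrd extension.

```python
from pathlib import Path


def find_unpaired(folder):
    """Return sorted base names that are missing either the _01 or _02 file.

    Hypothetical helper: assumes names such as Fish1_01.nrrd (tERK) and
    Fish1_02.nrrd (GFP), following the suffix convention in this protocol.
    """
    terk, gfp = set(), set()
    for path in Path(folder).glob("*.nrrd"):
        stem = path.stem  # file name without the .nrrd extension
        if stem.endswith("_01"):
            terk.add(stem[:-3])
        elif stem.endswith("_02"):
            gfp.add(stem[:-3])
    # Symmetric difference: base names present in only one of the two sets
    return sorted(terk ^ gfp)
```

Calling find_unpaired on the folder of exported channels should return an empty list when every brain has both files; any name it returns is a brain with a missing channel.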
Registration to the reference brain
16 CMTK Registration Runner was developed by Sándor Kovács. CMTK was developed by Torsten Rohlfing. Munger was developed by Greg Jefferis. Parameters for registering zebrafish brains to the reference brain were determined by Owen Randlett. The reference zebrafish brain image was taken by Owen Randlett and is hosted on FishExplorer, a website maintained by the Engert lab.

For alternative instructions using the command line, see Randlett et al. (2015)
In order to facilitate this step for those who are not comfortable using the command line, we strongly recommend using the CMTK Registration Runner GUI by Sándor Kovács. There are detailed instructions to install and use this program at the link below. Install the version appropriate for your operating system.
https://github.com/sandorbx/Fiji-CMTK-registration-runner-GUI#readme
16.2 Once installation is finished, register brains. Note that these instructions are for a Windows user; Mac / Linux users will need to modify them. Begin by opening MobaXterm, the Linux emulator you just installed from the link above.

In the left side menu, click WSL-Ubuntu-20.04. This should open a new tab in MobaXterm with the header "WSL-Ubuntu-20.04." In that tab, type pcmanfm and press enter.
A screenshot of what MobaXterm will look like just before pressing "enter."
16.4 A new window will appear. Click on Fiji.app and then click on ImageJ. In the new dialog box that appears, click on "Execute." You don't need to click "Execute in
16.6 You will need to download the reference brain from the ZBrain 2.0 atlas before you can register your brains. To do this, go to this link: https://zebrafishatlas.zib.de/downloads. On the right side, under "Other", there will be a button for "Reference brain." Move the reference brain file to an easy-to-access place in your file structure, but do not put it in the same folder as the brains that you will be registering.
16.7 For the field "CMTK library with Munger" navigate to the file Fiji.app/lib/cmtk_munger_wsl_linux and click "Select." You will only have to do this once.
Navigating to Fiji.app/lib/cmtk_munger_wsl_linux
16.8 For "reference brain (file)" navigate to the reference brain image file.
16.9 For "images to register (directory)" navigate to the folder in which all of your brain images have been saved. They should be .nrrds with the suffix _01 for the tERK file and _02 for the GFP file.
For "output selection" we recommend making a new folder for your registered brains. In CMTK Registration Runner, this translates to:
a (run affine transformation): CHECK
w (run warp transformation): CHECK
c (channels for registration): CHECK the number of channels in your images
r (run reformat on those channels): CHECK
T (threads): default auto. Number of compute threads to use; user's choice, depends on the computer's capabilities
X (exploration): 52
C (coarsest): 8
R (refine): 3
G (grid spacing): 80
Accuracy: 1.0

16.10 Click "OK" to run your registration. This may take some time (e.g. hours), depending on how many brains you are registering and how many threads you are using.
A screenshot of parameters entered into CMTK Registration Runner
16.11 Once the brains are done registering, open the registered brains in ImageJ along with the reference brain. Carefully compare the tERK channel in the registered brain with the reference brain. If the brain has not registered correctly, there are two options. The first is to exclude it from the analysis. The second option is to take all of the incorrectly registered brains from a single batch and register them to a brain from the same batch that did register correctly, as an intermediate step, then re-register all of those brains to the reference brain. This sometimes succeeds if the tERK staining pattern varies only slightly from the reference brain's, likely due to batch effects in staining. If the fish were poorly positioned during imaging, it is unlikely they can be saved.
16.12 Make sure your registered brains are in their own folder with no other files or subfolders in it. At this point, you can set aside the registered tERK channels (with the suffix _01). All future steps are for the GFP channels (with the suffix _02) only.
Smooth and reformat your registered brains using the PrepareStacksForMAPMapping.ijm macro in FIJI. This macro can be found at Owen Randlett's github page, https://github.com/owenrandlett/Z-Brain, or here: PrepareStacksForMAPMapping.ijm. Make sure that the maximum pixel intensity ("max =" __, line 3) is correct. If you're using 8-bit images, the correct number is 256. Once you click "Run" you will be asked to direct the computer to the folder containing your .nrrd output images as well as a new folder where the smoothed and reformatted images should go. Running this step should be substantially faster than registration (e.g. 15 minutes or less).
After this step, in your output folder there should be a new .tiff file corresponding to each of your registered .nrrd files. If you've allowed the default naming of each step, your filenames should be something like "Ref20131120pt14pl2_Fish1_02_warp_m0g80c8e1e-1x52r3.nrrdGauSmooth.tiff"
The ImageJ macro "PrepareStacksForMAPMapping.ijm"
Quantification of signal in each brain region
17 This step, quantification of GFP signal in each brain region, requires the use of Matlab (https://www.mathworks.com/products/matlab.html). You will need to download the file 'AnatomyLabelDatabaseDownsampled.hdf5' from https://zebrafishatlas.zib.de/downloads (under the "Others" header at the far right).
You will need to download the file 'MaskDatabaseDownsampled.hdf5' here: MaskDatabaseDownsampled.mat
Since the Matlab section of this code will likely take less than 1 hour to run, assuming you have gathered all the necessary information in advance, it may be possible to run it on a shared computer or using the Matlab free trial, if purchasing the program is not an option.
The code in this section was modified from code originally written by Owen Randlett. Reference: Randlett et al. (2015). Whole-brain activity mapping onto a zebrafish brain atlas. Nature Methods.

The function QuantifySignalMultipleBrains.m will take as an argument your chosen output file name.
Within the function, you will point Matlab to a folder containing the .tiff files of your aligned, smoothed brains. The function also requires the files 'AnatomyLabelDatabaseDownsampled.hdf5' and "MaskDatabaseDownsampled" to be on the computer, and you will be asked to direct Matlab to the folder containing these files.
For those new to Matlab --your command will look like this: QuantifySignalMultipleBrains("YourDesiredFileNameHere"). Don't forget the quotes! There is no need to have a ".csv" in the filename.
The function will loop through all of your .tiff files, asking you to input values for key variables that will go into the column name corresponding to that brain. The output of the function will be a .csv file. Each column of the file corresponds to a single brain. Each row corresponds to the signal intensity in one of the 293 brain regions in the anatomy database.
Each column header contains some metadata about the fish that you provided.
If a given piece of metadata is not provided, its space is filled with an "x". If the metadata is out of order or not marked by an "x" the R code in the next step will not function as intended.
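The "x" placeholder convention can be illustrated with a short Python sketch. This is not the Matlab function itself; the helper name and field names are hypothetical, and the field order (fish, pair, genotype, phenotype, date collected, date imaged) is an assumption taken from the example sample name given in the R import step of this protocol.

```python
# Assumed field order, based on the example name Fish10_41_mut_LLC_210314_210320
FIELDS = ("fish", "pair", "geno", "pheno", "collected", "imaged")


def build_fish_name(**metadata):
    """Join metadata into one underscore-separated column header.

    Hypothetical helper: any field that is missing or empty is written as
    "x" so that every header has the same number of parts in the same order.
    """
    parts = []
    for field in FIELDS:
        value = metadata.get(field)
        text = "" if value is None else str(value).strip()
        if "_" in text:
            # An underscore inside a value would shift every later field
            raise ValueError(f"{field!r} must not contain an underscore: {text!r}")
        parts.append(text if text else "x")
    return "_".join(parts)


print(build_fish_name(fish="Fish10", pair=41, geno="mut", pheno="LLC",
                      collected=210314, imaged=210320))
print(build_fish_name(fish="Fish2", pheno="LLC"))
```

The second call shows why the placeholder matters: without the "x" entries, the phenotype would land in the position reserved for the pair and downstream parsing would mislabel it.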
Screenshot from Excel showing example output from the QuantifySignalMultipleBrains function. In the left column, "ROIname," (Region of Interest Name) are the names of brain regions from the anatomy database. In the right column, we have a brain name that describes, from left to right, the filename, the genotype, the phenotype ("LLC" refers to a fish that performed predominantly long-latency C-bends on a decision-making assay), the date the fish was stained, and the date the fish was imaged. Values in the cells are raw GFP signal intensity.
If you enter anything incorrectly while the Matlab program is looping, you can always make a note of it and fix the mistake in the column headers later. It is important not to have any typos in these headers, as they will be used as variables for analysis once the data is imported into R.
1h During this step, you may choose to modify the information collected about each brain by editing the Matlab code (for example, if you are working with a double mutant, you will need to add a step where you document the genotype for Gene A and for Gene B). Keep in mind that you will have to tweak the file import and data processing in R if you choose to do this.
Import data into R 1h
18 This section requires RStudio, which is open source and free.
For your own analysis, you may either choose to modify the example analysis that we present here, or create your own R script or RMarkdown document based upon these steps and our RMarkdown example. Typically an R script will be simpler for a new user of R to work with, but an RMarkdown document can be used to generate reports in html or .pdf format. You can always start by creating an R script, then adapting the code into an RMarkdown format once it is working.
For new R users, we recommend referring to the book R for Data Science by Hadley Wickham and Garrett Grolemund. It is available free as an ebook (with author permission) here. If a hard copy is desired, one can be purchased here.
For those interested in learning more about multivariate analysis and its implementation in R, we recommend referring to An Introduction to Statistical Learning. The .pdf is available for download (with author permission) here. If a hard copy is desired, one can be purchased here. StatQuest (https://statquest.org/) is also an excellent free online resource for those with little background in multivariate statistics.
18.1 I have provided an example analysis in the following RMarkdown files.

MAVEN_Whole_Project_220222.html
For examples of data analysis, see the RMarkdown files (html file is an interactive "tour" of analysis and figures, while RMarkdown file contains editable code for performing all the analyses and generating all the figures in the .html file).
Before you can analyze your own data, you will need to install packages, load some custom functions (attached below) and modify the data import section of this code. Instructions for each step are provided below.
18.4 Next, following the R code, we load into R the raw brain region signal intensity data that we generated using Matlab.
We assume that each sample name contains some metadata about the fish: namely, its number, the pair it came from, its genotype, phenotype, the date it was collected (e.g. the date behavior was performed) and the date it was imaged. Thus, the FishName column contains entries that look like this:

Fish10_41_mut_LLC_210314_210320
In the Matlab code, if a given piece of metadata is not provided by the user, its space is filled with an "x". If the metadata is out of order or not marked by an "x" the R code will not function as intended.
You will need to modify the code to point it to your Matlab output file.
The custom function "tidyImportedDataUnderscore.R" contains the code that transposes the imported file and splits the long FishName into the columns "FishNum," "pair," "geno," "pheno," "collected," and "Imaged." If you altered the Matlab code for generating descriptive column headers for larvae, you should modify lines 23-29 of the tidyImportedDataUnderscore function accordingly. Each unique descriptor of your data should get its own column.
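The transpose-and-split operation described above can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R function: the function name is hypothetical, the input is assumed to be a .csv with a first column named ROIname and one column per fish, and the two regions and their values are invented for the example.

```python
import csv
import io

# Column names taken from the description of tidyImportedDataUnderscore
META_COLUMNS = ["FishNum", "pair", "geno", "pheno", "collected", "Imaged"]


def tidy_imported_data(csv_text):
    """Return one record per fish: metadata fields plus one value per region.

    Hypothetical re-implementation for illustration only. Each input column
    (a fish) becomes one output record (a row), which is the transpose.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    tidy = []
    for col, fish_name in enumerate(header[1:], start=1):
        parts = fish_name.split("_")
        if len(parts) != len(META_COLUMNS):
            raise ValueError(f"Unexpected number of fields in {fish_name!r}")
        record = dict(zip(META_COLUMNS, parts))
        for row in body:
            record[row[0]] = float(row[col])  # row[0] is the region name
        tidy.append(record)
    return tidy


# Invented two-region example; real files have one row per brain region
example = (
    "ROIname,Fish10_41_mut_LLC_210314_210320,Fish11_41_x_LLC_210314_210320\n"
    "regionA,12.5,3.0\n"
    "regionB,7.0,9.5\n"
)
tidy = tidy_imported_data(example)
print(tidy[0]["geno"], tidy[0]["regionA"])
```

The length check is the reason the "x" placeholder matters: a header with a missing field would otherwise be split into the wrong columns without any warning.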
RStudio console showing data after the tidyImportedDataUnderscore function has been applied. Besides transposing the data, this function also breaks up the long fish description into individual variables, which can now be grouped by and sorted for. If alterations were made to experimental design earlier in the pipeline, this function must also be modified.
18.5 We also load in a list of the names of all 293 brain regions plus corresponding abbreviations for these brain regions. We often use the abbreviated forms in our graphs because some of the true anatomical brain regions are long enough to interfere with axis labels for figures. If you want to move between different forms of brain region names, you can use a join function to combine your data with this key, then the dplyr select function to retain only the form of the names that you want to work with.
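The join-then-select idea can be shown with a small lookup in Python. This is an illustration only; in the actual analysis this is done in R with a join and dplyr's select. The function name, the column names "full_name" and "abbreviation", and the two example regions are hypothetical and are not taken from the translator spreadsheet.

```python
def rename_regions(signals, translator, source="full_name", target="abbreviation"):
    """Re-key per-region values from one naming form to another.

    Hypothetical helper: `translator` is a list of rows from the key, each
    holding both forms of a region name; `signals` maps region name to value.
    """
    lookup = {row[source]: row[target] for row in translator}
    missing = [name for name in signals if name not in lookup]
    if missing:
        # Fail loudly rather than silently dropping regions absent from the key
        raise KeyError(f"No translation for: {missing}")
    return {lookup[name]: value for name, value in signals.items()}


# Invented key and values for illustration
translator = [
    {"full_name": "Example region one", "abbreviation": "ER1"},
    {"full_name": "Example region two", "abbreviation": "ER2"},
]
signals = {"Example region one": 12.5, "Example region two": 7.0}
print(rename_regions(signals, translator))
```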
293_BrainRegions_Translator.xlsx
18.6 In Section 7.1 of the R code, "Correlational analysis: which other regions correlate with the DCR6?", the user generates a list of brain regions where signal is most highly correlated with that of another brain region. This can be used to identify alternative candidate phenotype-causative regions if the first region identified by LASSO regression fails to validate. The R code exports a .csv file in a format that can be read by the Matlab function CustomBrainRegionStack.m, which can be used to generate customized images showing the anatomical arrangement of the specified set of brain regions. A detailed description of how to use that function is contained in its header.
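The idea of ranking regions by their correlation with a region of interest can be sketched as below. This Python example is only an illustration and does not reproduce the R code; the function names and the toy data are hypothetical, and the use of Pearson correlation sorted from most positive to most negative is an assumption about what "most highly correlated" means here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def rank_correlated_regions(signals, seed, top_n=5):
    """Rank all other regions by correlation with the seed region.

    `signals` maps region name -> per-fish signal values in a fixed fish order.
    """
    scores = [(region, pearson(signals[seed], values))
              for region, values in signals.items() if region != seed]
    scores = [item for item in scores if not math.isnan(item[1])]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Invented data: four fish, one seed region and three other regions
signals = {
    "seed": [1.0, 2.0, 3.0, 4.0],
    "regionA": [2.0, 4.0, 6.0, 8.0],
    "regionB": [4.0, 3.0, 2.0, 1.0],
    "regionC": [1.0, 3.0, 2.0, 4.0],
}
ranked = rank_correlated_regions(signals, "seed")
print([region for region, _ in ranked])
```

Regions that rank near the top share much of their expression variation with the seed region, which is why they are reasonable fallback candidates, but correlation alone cannot separate them from the seed region as the causal site.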
18.7 From this point forward, use the comments and instructions within the RMarkdown code chunks to guide your own analysis. After every step, be sure to stop and think about the results and what they imply for future steps. It is likely that every analysis will be slightly different, since the underlying structure of the variation in gene expression will be slightly different. This code should provide a useful framework, but a good grasp on multivariate statistics is also essential to help interpret results as you go. If you need help, refer to resources in the note of Step 18. Good luck!